Redundancy Estimates for Word-Based Encoding of 
Sequences Produced by a Bernoulli Source* 



G. L. Khodak 






Abstract 

The efficiency of a code is estimated by its redundancy $R$, while the complexity of a code is estimated by its average delay $\bar N$. In this work we construct word-based codes, for which $R \lesssim \bar N^{-5/3}$. Therefore, word-based codes can attain the same redundancy as block-codes while being much less complex.

We also consider uniform on the output codes, the benefit of which is the lack of a running synchronization error. For such codes $\bar N^{-1} \lesssim R \lesssim \bar N^{-1}$, except for the case when all input symbols are equiprobable, when $R \leqslant \bar N^{-2}$ for infinitely many $\bar N$.

1 Introduction 

Consider a Bernoulli source sequentially producing symbols from an input alphabet $a_1, \ldots, a_m$ ($2 \leqslant m < \infty$) with probabilities $p_1, \ldots, p_m$, $\sum_{i=1}^{m} p_i = 1$, $p_i > 0$ ($i = 1, \ldots, m$). The entropy of the source is $H = -\sum_{i=1}^{m} p_i \log_2 p_i$. Assume that a message is an infinite-length sequence of symbols from the input alphabet, $\{a_{i_k}\}_{k=1}^{\infty}$. It is necessary to map such a message to a sequence of symbols from an output alphabet $b_1, \ldots, b_n$ ($2 \leqslant n < \infty$), which is its code. Such a mapping can be established by using word-based codes. Select a finite set of words $A_j$ ($j = 1, 2, \ldots$) from the input alphabet, such that any message can be uniquely represented by a sequence of such words (this immediately implies that the words $A_j$ are prefix-free, i.e. no word is a prefix of another). In turn, words $A_j$ are represented by words $\varphi(A_j)$ from the output alphabet. A word-based code for a given message is constructed as follows:

    $\{a_{i_k}\}_{k=1}^{\infty} = \{A_{j_s}\}_{s=1}^{\infty} \to \{\varphi(A_{j_s})\}_{s=1}^{\infty} = \{b_{i_k}\}_{k=1}^{\infty}$.

In this paper, we only consider decipherable encodings, i.e. ones such that $\varphi(A_{i_1})\varphi(A_{i_2}) \ldots \varphi(A_{i_s}) = \varphi(A_{j_1})\varphi(A_{j_2}) \ldots \varphi(A_{j_t})$ always implies that $s = t$ and $\varphi(A_{i_k}) = \varphi(A_{j_k})$, $k = 1, \ldots, s$. The constructed codes have, in fact, an even stronger property, namely that different messages have different codes.

In the terminology of V. I. Levenstein [1], a word-based code is specified by a coding system $\{A, U, B, V\}$, where $A$ is the input alphabet, $B$ is the output alphabet, $U$ is the set of words $A_j$, $V = \{\varphi(A_j)\}$, and it is required that $U$ is strongly (prefix-) free, and that any message begins with a word in $U$. The number of letters in a word $A$ (i.e. its length) is denoted by $|A|$. The code is called a block-code, or uniform on the input code, if all words $A_j$ have the same length. The code is called a uniform on the output code if all words $\varphi(A_j)$ have the same length.

The probability of a word $A = a_{i_1} \ldots a_{i_k}$ in the input alphabet is denoted by $p(A)$. For a Bernoulli source $p(A) = p_{i_1} \ldots p_{i_k}$.

The complexity of a code is estimated by using its delays: average $\bar N = \sum_j p(A_j)\,|A_j|$, and maximum $N = \max_j |A_j|$. For block codes $|A_j| = \bar N = N$ ($j = 1, \ldots, m^N$).

The efficiency of a code is estimated by using its redundancy: $R = \bar N^{-1} \sum_j p(A_j)\,|\varphi(A_j)| - H \log_2^{-1} n$. C. Shannon has shown that $0 \leqslant R \leqslant \bar N^{-1}$ [2]. From the paper of V. M. Sidelnikov [3] it follows that for all word-based codes $R \geqslant 0$. The redundancy shows by how much the average number of output letters per input letter exceeds the minimum necessary. Note that both redundancy and average delay are continuous functions of the probabilities of symbols $p_1, \ldots, p_m$.

*Translation from Russian original: "Ocenki izbytochnosti pri poslovnom kodirovanii soobscheniy, porojdaemyh bernullievskim istochnikom", Problemy Peredachi Informacii (Problems of Information Transmission), 8 (2) (1972) 21-32. Translated by Yuriy A. Reznik, yreznik@ieee.org.

R. E. Krichevski [4] has shown that for optimal block-codes $R \gtrsim N^{-1}$ ($N \to \infty$), except for the sources with coinciding fractional parts of $\log_n p_i$. In the present paper, we construct word-based codes, for which $R \lesssim \bar N^{-5/3}$, $N \lesssim \bar N \log \bar N$. Compared with block codes of the same redundancy our codes are much less complex. It is proven that for almost all Bernoulli sources (we apply Lebesgue measure on points $(p_1, \ldots, p_{m-1})$) word-based codes satisfy $R \gtrsim \bar N^{-9} \log^{-8} \bar N$.

Word-based codes are susceptible to running synchronization errors, i.e. a single error in an encoded message may result in incorrect separation of words $\varphi(A_j)$ in an arbitrarily large portion of the code, resulting in an arbitrarily large number of errors in the reconstructed message. Uniform on the output codes have the advantage that the corresponding error in the reconstruction is limited to a single word $A_j$. We construct uniform on the output codes, for which $R \lesssim \bar N^{-1}$. It is proven that if not all $p_i = 1/m$ ($i = 1, \ldots, m$), then $R \gtrsim \bar N^{-1}$. If $p_1 = \ldots = p_m = 1/m$, then for infinitely many positive integers $\bar N$: $R \leqslant \bar N^{-2}$.



2 Relation between redundancy and lengths of words $\varphi(A_j)$

From the paper of V. M. Sidelnikov [3] it follows that $H = -\bar N^{-1} \sum_j p(A_j) \log_2 p(A_j)$. Using this equation we arrive at:

    $R = \bar N^{-1} \sum_j p(A_j) \left( |\varphi(A_j)| + \log_n p(A_j) \right)$.    (1)

We introduce the following notation:

    $\delta = 1 - \sum_j n^{-|\varphi(A_j)|}$,    (2)

    $\varepsilon_j = |\varphi(A_j)| + \log_n p(A_j)$  ($j = 1, 2, \ldots$),    (3)

    $\varepsilon'_j = \begin{cases} -1 & \text{if } \varepsilon_j < -1, \\ \varepsilon_j & \text{if } |\varepsilon_j| \leqslant 1, \\ 1 & \text{if } \varepsilon_j > 1. \end{cases}$    (4)

It is well known (see, e.g. [5]) that the necessary and sufficient condition for the existence of a decipherable code with lengths of codewords $|\varphi(A_j)|$ ($j = 1, 2, \ldots$) is given by the Kraft inequality $\delta \geqslant 0$.
Theorem 1. The redundancy of a decipherable code satisfies:

    $R \geqslant \bar N^{-1} \left( \delta \ln^{-1} n + \frac{1}{2n} \ln n \sum_j p(A_j)\, \varepsilon_j'^{\,2} \right)$.

If $|\varepsilon_j| \leqslant 1$ for all $j = 1, 2, \ldots$, then

    $R \leqslant \bar N^{-1} \left( \delta \ln^{-1} n + \frac{n}{2} \ln n \sum_j p(A_j)\, \varepsilon_j^{2} \right)$.

Proof. Decompose $n^{-\varepsilon_j}$ in a Taylor series ($j = 1, 2, \ldots$):

    $n^{-\varepsilon_j} = 1 - \varepsilon_j \ln n + \eta(\varepsilon_j)$.    (5)

The remainder is

    $\eta(\varepsilon_j) = n^{-\varepsilon_j} - 1 + \varepsilon_j \ln n$.    (6)

From the sign of $\frac{d}{d\varepsilon} \eta(\varepsilon)$ it follows that

    $\eta(\varepsilon'_j) \leqslant \eta(\varepsilon_j)$.    (7)

Since $\varepsilon'_j \in [-1, 1]$, the Lagrange estimate for the remainder is

    $\frac{\ln^2 n}{2n}\, \varepsilon_j'^{\,2} \leqslant \eta(\varepsilon'_j) \leqslant \frac{n \ln^2 n}{2}\, \varepsilon_j'^{\,2}$.    (8)

By multiplying (5) by $p(A_j)$ and summing all terms over $j$, we obtain

    $\sum_j p(A_j)\, n^{-\varepsilon_j} = \sum_j p(A_j) \left( 1 - \varepsilon_j \ln n + \eta(\varepsilon_j) \right)$.    (9)

From (3) we have

    $p(A_j)\, n^{-\varepsilon_j} = n^{-|\varphi(A_j)|}$,    (10)

and from (1) and (3)

    $\sum_j p(A_j)\, \varepsilon_j = R \bar N$.    (11)

From (2), (9)-(11) it follows that

    $\bar N R = \ln^{-1} n \left( \delta + \sum_j p(A_j)\, \eta(\varepsilon_j) \right)$.    (12)

The statement of the theorem follows from (7), (8), and (12). □

1 Here, as usual, the notation $f \gtrsim g$ means that $\lim f/g > 0$. If $f \gtrsim g$, then there exists a constant $c > 0$ such that $f \geqslant cg$ for all arguments. Assuming the existence of such an inequality, we, in some instances, may not specify the direction of growth of the argument.

By ||x|| we denote the distance of real number x to its nearest integer. 
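Corollary 1. The following inequality holds:

    $R \geqslant \bar N^{-1}\, \frac{\ln n}{2n} \sum_j p(A_j)\, \|\log_n p(A_j)\|^2$.

This follows from the first claim of Theorem 1, the Kraft inequality, and the observation that $|\varepsilon'_j| \geqslant \|\log_n p(A_j)\|$.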
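The quantities of this section can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes a binary output alphabet and an illustrative set of four parse words with their codeword lengths, evaluates (1)-(4), and compares $R$ with the two bounds of Theorem 1 (using the Lagrange constants $\ln n/(2n)$ and $n \ln n/2$) and with the weaker bound obtained by dropping $\delta$ and replacing $|\varepsilon'_j|$ by $\|\log_n p(A_j)\|$. The function names are hypothetical.

```python
import math

def dist_to_int(x):
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

def redundancy_bounds(word_probs, word_lens, code_lens, n):
    """Return (R, lower bound, upper bound or None, corollary bound)."""
    n_bar = sum(p * w for p, w in zip(word_probs, word_lens))           # average delay
    delta = 1 - sum(n ** (-l) for l in code_lens)                       # (2)
    eps = [l + math.log(p, n) for p, l in zip(word_probs, code_lens)]   # (3)
    eps_clip = [max(-1.0, min(1.0, e)) for e in eps]                    # (4)
    r = sum(p * e for p, e in zip(word_probs, eps)) / n_bar             # (1)
    ln_n = math.log(n)
    sq_clip = sum(p * e * e for p, e in zip(word_probs, eps_clip))
    lower = (delta / ln_n + ln_n / (2 * n) * sq_clip) / n_bar
    upper = None
    if all(abs(e) <= 1 for e in eps):                                   # second claim applies
        sq = sum(p * e * e for p, e in zip(word_probs, eps))
        upper = (delta / ln_n + n * ln_n / 2 * sq) / n_bar
    sq_dist = sum(p * dist_to_int(math.log(p, n)) ** 2 for p in word_probs)
    corollary = ln_n / (2 * n) * sq_dist / n_bar
    return r, lower, upper, corollary

# Illustrative data (an assumption): words a, ba, bba, bbb for p(a)=0.4, p(b)=0.6,
# encoded with binary codewords of lengths 1, 2, 3, 3.
word_probs = [0.4, 0.6 * 0.4, 0.6 * 0.6 * 0.4, 0.6 ** 3]
word_lens = [1, 2, 3, 3]
code_lens = [1, 2, 3, 3]
r, lower, upper, corollary = redundancy_bounds(word_probs, word_lens, code_lens, 2)
```

For these numbers the chain corollary bound, lower bound, $R$, upper bound is increasing, as the theorem requires; the parse words must form a complete prefix-free set for formula (1) to apply.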

3 On approximation of linear forms by integer numbers 

From Theorem 1 and the Corollary it follows that the redundancy (of a word-based code) depends on the quantities $\|\log_n p(A_j)\|$. If $k_i$ is the number of letters $a_i$ in a word $A$, then $\log_n p(A) = \sum_{i=1}^{m} k_i \log_n p_i$ is a linear form of $k_i$.

Consider an arbitrary linear form $f(k_1, \ldots, k_m) = \sum_{i=1}^{m} k_i d_i$, where the coefficients $d_i$ are fixed, and $k_i$ ($i = 1, \ldots, m$) are integer numbers.

By $[x]$ and $\{x\}$ we denote the integer and fractional parts of a real number $x$, respectively; $\|x\| = \min(\{x\}, 1 - \{x\})$. We will also need the following obvious relationships ($x$, $y$ are real numbers, $l$ is an integer):

    $\{x + l\} = \{x\}$,    (13)
    $\{x + y\} \leqslant \{x\} + \{y\}$,    (14)
    if $\{x\} \geqslant \{y\}$, then $\{x - y\} = \{x\} - \{y\}$.    (15)

Lemma 1. If $d_m$ is irrational, then there exist infinitely many integers $T$, such that for any vector $(k_1, \ldots, k_m)$ there exist numbers $k'_m$ and $k''_m$, $0 \leqslant k'_m < T$, $0 \leqslant k''_m < T$, such that:

    $\{f(k_1, \ldots, k_m + k'_m)\} \leqslant 2/T$,
    $1 - \{f(k_1, \ldots, k_m + k''_m)\} \leqslant 2/T$.

Proof. For $T$ we pick the denominator of any fraction giving the best approximation to $d_m$, except for the first one [6, Chapter 1, § 2, p. 2]. Let $T'$ be the denominator of the preceding fraction. It has been shown in [6, Chapter 1, § 2, p. 3] that one of the following two statements holds:

    $\{T' d_m\} \leqslant T^{-1}$ and $1 - \{T d_m\} \leqslant T^{-1}$,    (16)
    $1 - \{T' d_m\} \leqslant T^{-1}$ and $\{T d_m\} \leqslant T^{-1}$.    (17)

Our proof is the same in both cases. So, for simplicity, assume that the correct statement is (16).

We prove the existence of $k'_m$ (the existence of $k''_m$ can be proven in the same way). Take an arbitrary vector $(k_1, \ldots, k_m)$. Since $d_m$ is irrational, there exists $k$ such that

    $\{f(k_1, \ldots, k_{m-1}, k)\} \leqslant 2/T$    (18)

(see [6, Chapter 4, § 3]). Let us prove that

    $\{f(k_1, \ldots, k_{m-1}, k + T)\} \leqslant 2/T$ or $\{f(k_1, \ldots, k_{m-1}, k + T')\} \leqslant 2/T$,    (19)

and also

    $\{f(k_1, \ldots, k_{m-1}, k - T)\} \leqslant 2/T$ or $\{f(k_1, \ldots, k_{m-1}, k - T')\} \leqslant 2/T$.    (20)

If

    $\{f(k_1, \ldots, k_{m-1}, k)\} \leqslant 1/T$,    (21)

then, from (16), (13), and (14) it follows that

    $\{f(k_1, \ldots, k_{m-1}, k + T')\} = \{f(k_1, \ldots, k_{m-1}, k) + T' d_m\} \leqslant \{f(k_1, \ldots, k_{m-1}, k)\} + \{T' d_m\} \leqslant 2/T$.    (22)

At the same time, if (21) is false, then from (18) we have

    $1/T < \{f(k_1, \ldots, k_{m-1}, k)\} \leqslant 2/T$.

In this case, from (16), (13), and (15) it follows that

    $\{f(k_1, \ldots, k_{m-1}, k + T)\} = \{f(k_1, \ldots, k_{m-1}, k)\} - (1 - \{T d_m\}) \leqslant 2/T$.    (23)

From (22) and (23) follows (19). Statement (20) can be proven in the same way.

Based on (19) and (20) it is clear that for every $k$ satisfying condition (18) there exist smaller and greater numbers at a distance not exceeding $T$ (and not less than 1) that also satisfy condition (18). Therefore, $k_m$ lies within some pair of such numbers, with the distance between these numbers not larger than $T$, which proves the lemma. □
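Lemma 1 can be explored numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes $d_m = -\log_2 0.6$ (a value chosen for illustration), computes the first continued-fraction denominators in floating point, and checks for random starting values $x$ (standing for the fractional part contributed by the fixed coordinates) that some shift $k' \in [0, T)$ brings $\{x + k' d_m\}$ within $2/T$ of 0 and some shift brings it within $2/T$ of 1. The helper names are hypothetical.

```python
import math
import random

def convergent_denominators(d, count):
    """Denominators q_0, q_1, ... of the continued-fraction convergents of d."""
    qs = []
    q_prev2, q_prev1 = 1, 0          # q_{-2} = 1, q_{-1} = 0
    x = d
    for _ in range(count):
        a = math.floor(x)            # next partial quotient
        q = a * q_prev1 + q_prev2
        qs.append(q)
        q_prev2, q_prev1 = q_prev1, q
        frac = x - a
        if frac < 1e-9:              # d is (numerically) rational: stop
            break
        x = 1.0 / frac
    return qs

def lemma1_holds(d, T, trials=200, seed=0):
    """Check both inequalities of Lemma 1 for random offsets x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.random() * 10
        fracs = [(x + k * d) % 1.0 for k in range(T)]
        if not (min(fracs) <= 2 / T and 1 - max(fracs) <= 2 / T):
            return False
    return True

d = -math.log2(0.6)                  # illustrative irrational coefficient
denominators = [T for T in convergent_denominators(d, 6) if T >= 2]
results = {T: lemma1_holds(d, T) for T in denominators}
```

Only a few denominators are used because floating-point continued fractions lose accuracy quickly; exact arithmetic would be needed for larger $T$.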



4 Estimates of the average and maximal lengths of words in 
some sets 

Hereafter, unless the contrary is stated, we assume that words are taken from an input alphabet $\{a_1, \ldots, a_m\}$. In this section, we obtain an estimate for the average length and cumulative probability of words of sufficiently large lengths for a given selection of words in a set, conforming, in particular, to the conditions of Lemma 1. Proofs of these estimates are omitted, but they can be easily reconstructed by using the statements and the order of lemmas in this section.

By $k(A)$ we denote a vector $(k_1, \ldots, k_m)$, where each coordinate $k_i$ is the number of letters $a_i$ in a word $A$. We call such a vector $k(A)$ the profile of the word $A$. Let also $t(A) = \sum_{i=1}^{m-1} k_i$. By definition of word length $|A| = \sum_{i=1}^{m} k_i$, and by definition of probability $p(A) = p_1^{k_1} \ldots p_m^{k_m}$.

By $A'A''$ we denote the result of concatenation of words $A'$ and $A''$. In accordance with the definitions:

    $k(A'A'') = k(A') + k(A'')$,
    $t(A'A'') = t(A') + t(A'')$,
    $|A'A''| = |A'| + |A''|$,
    $p(A'A'') = p(A')\,p(A'')$.

Assume that the set of all words also contains an empty word, $\Lambda$. For such a word: $k(\Lambda) = (0, \ldots, 0)$, $p(\Lambda) = 1$, and for any word $A$: $A\Lambda = \Lambda A = A$.

In what follows, all numbers, except for probabilities of symbols and constants in estimates ($c_1, \ldots, c_m$), are assumed to be non-negative integers.

Each set $\mathfrak{M}$ of vectors $(k_1, \ldots, k_m)$ can be associated with a set of words $M$. Suppose that $A \in M$ if and only if $k(A) \in \mathfrak{M}$, and $A$ cannot be decomposed into $A'A''$ such that $k(A') \in \mathfrak{M}$ and $A'' \neq \Lambda$. I.e., $M$ is a prefix-free set.
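The association between a set of profile vectors and a prefix-free set of words can be sketched in a few lines. This is an illustrative brute-force enumeration, not taken from the paper: the vector set is supplied as a predicate, words are enumerated up to an assumed maximum length, and a word is kept when its profile is in the set and no shorter non-empty prefix has a profile in the set. The vector set used in the example (the vector $(1, 0)$ plus all vectors with coordinate sum 3) is an assumption chosen to mirror the small example of Section 5.

```python
from itertools import product

def profile(word, alphabet):
    """k(A): number of occurrences of each letter of the alphabet in the word."""
    return tuple(word.count(a) for a in alphabet)

def associated_words(in_vector_set, alphabet, max_len):
    """Words whose profile is in the set and that have no proper prefix with that property."""
    words = []
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if not in_vector_set(profile(word, alphabet)):
                continue
            shorter_hit = any(
                in_vector_set(profile(word[:i], alphabet)) for i in range(1, length)
            )
            if not shorter_hit:
                words.append(word)
    return words

# Hypothetical vector set: (1, 0) together with every vector of total weight 3.
def in_set(k):
    return k == (1, 0) or sum(k) == 3

m_words = associated_words(in_set, "ab", 3)
```

The result is prefix-free by construction, since any word with a shorter qualifying prefix is discarded.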

Lemma 2. Given any set $\mathfrak{M}$ and a word $A$, if $k(A) \in \mathfrak{M}$, then $A$ can be represented as $A'A''$, where $A' \in M$.

Condition 1. We say that a set $\mathfrak{M}$ of vectors $(k_1, \ldots, k_m)$ satisfies Condition 1 with parameter $T$, if for each $s \geqslant 1$ and each vector $(k_1, \ldots, k_m)$ such that $\sum_{i=1}^{m-1} k_i = sT^2$, there exists $k'_m$ such that $0 \leqslant k'_m < T$ and $(k_1, \ldots, k_{m-1}, k'_m + k_m) \in \mathfrak{M}$.

By $F(D)$ we denote the set of words $A = a_{i_1} \ldots a_{i_r}$ such that $a_{i_r} \neq a_m$ and $t(A) = D$. Let also $F(0) = \{\Lambda\}$. It is clear that $F(D)$ is a prefix-free set.

Lemma 3. For any $D_i \geqslant 1$ such that $\sum_i D_i = D$, any word $A \in F(D)$ has a unique decomposition into $A_1 A_2 \ldots A_i \ldots$, where $A_i \in F(D_i)$ ($i = 1, 2, \ldots$).

Lemma 4. Let $D \geqslant 1$ and $A = a_{i_1} \ldots a_{i_r} \in F(D)$. If $a_{i_1} = a_m$, then $a_{i_2} \ldots a_{i_r} \in F(D)$ and vice versa. If $a_{i_1} \neq a_m$, then $a_{i_2} \ldots a_{i_r} \in F(D - 1)$ and vice versa.

Lemma 5. If $M$ is a prefix-free set, then for any word $A'$

    $\sum_{A:\, A'A \in M} p(A) \leqslant 1$.

Lemma 6. For all $D \geqslant 1$

    $\sum_{A \in F(D)} p(A) = 1$.

Lemma 7. There exists a constant $c_1 > 0$ such that for each $\mathfrak{M}$ that satisfies Condition 1 with parameter $T$, any $s$, and any word $A' \in F(sT^2)$, the following holds:

    $\sum_{A \in F(T^2),\; k(A'A) \in \mathfrak{M}} p(A) \geqslant c_1 T^{-1}$.

By $F_1(D, M)$ we denote the set of words $A \in F(D)$ which cannot be decomposed into $A'A''$, where $A' \in M$ and $A'' \neq \Lambda$.

Lemma 8. For any $\mathfrak{M}$ satisfying Condition 1 with parameter $T$, and any $s \geqslant 1$, the following holds:

    $\sum_{A \in F_1(sT^2, M)} p(A) \leqslant (1 - c_1 T^{-1})^s$,

where $c_1$ is the constant whose existence is guaranteed by Lemma 7.
Lemma 9. The following holds: 

AdF(D) Pm 

The main result in this section is given by the following lemma.

Lemma 10. For any $\mathfrak{M}$ satisfying Condition 1 with parameter $T$, the following holds:

    $\sum_{A \in M} p(A)\,|A| \lesssim T^3$  ($T \to \infty$).

Now, given a fixed number $T$, we would like to find out how to select the minimum length $T_2$ of words such that their combined probability is sufficiently small. Such a result will be needed for estimating the maximum delay of the code.

Lemma 11. There exists $T_2 = T_2(T)$ such that

    $T_2 \lesssim T^3 \ln T$  ($T \to \infty$),

and for any $\mathfrak{M}$ satisfying Condition 1 with parameter $T$, the following holds:

    $\sum_{A \in M,\; |A| \geqslant T_2} p(A) \lesssim T^{-2}$  ($T \to \infty$).

5 Construction of the code 

As we pointed out in Section 1, in order to construct a (word-based) code one needs to specify a set of words $A_j$ ($j = 1, 2, \ldots$) such that any incoming message can be uniquely represented by them. In addition, words $A_j$ need to be mapped to output words $\varphi(A_j)$ such that the resulting code is decipherable. Hereafter, we assume that all words are not empty.

Let $M'$ and $M''$ be some sets of words. By $M' \triangle M''$ we denote a prefix-free extension of $M'$ by words from $M''$. In other words, $M' \triangle M''$ is the set of words from $M' \cup M''$ which cannot be represented as $A'A''$, where $A' \in M' \cup M''$ and $A''$ is not empty. It is clear that $M' \triangle M''$ is prefix-free and that the operation $\triangle$ is commutative and associative. If $M$ is prefix-free, then $M \triangle M = M$.

Lemma 12. If any message begins with a word from $M'$, then it can also be uniquely represented by words from $M' \triangle M''$, with any extension set $M''$.

The proof follows from the definition of the operation $\triangle$.
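The prefix-free extension and the resulting unique parsing can be sketched as follows. This is an illustrative implementation under the definition above (keep the words of the union that have no other word of the union as a proper prefix); the function names and the sample message are hypothetical, and the two word sets are the ones listed in the example at the end of this section.

```python
def prefix_free_extension(m1, m2):
    """Words of the union that have no other word of the union as a proper prefix."""
    union = set(m1) | set(m2)
    return {w for w in union
            if not any(w != v and w.startswith(v) for v in union)}

def parse(message, words):
    """Split a finite message into words of a prefix-free set; return (words, unparsed tail)."""
    parsed, start = [], 0
    for end in range(1, len(message) + 1):
        if message[start:end] in words:
            parsed.append(message[start:end])
            start = end
    return parsed, message[start:]

m1 = {"a", "baa", "bab", "bba", "bbb"}
m2 = {"bba", "bbb", "ab", "ba", "aaa", "aab"}
extended = prefix_free_extension(m1, m2)
parsed, tail = parse("abbabab", extended)
```

Because the extended set is prefix-free, the left-to-right scan in `parse` never has a choice to make, which is the uniqueness claimed by Lemma 12 for this small case.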

Theorem 2. For any Bernoulli source and infinitely many $T$ there exist decipherable codes such that

    $R \lesssim \bar N^{-1} T^{-2}$,  $\bar N \lesssim T^3$,  $N \lesssim T^3 \ln T$  ($T \to \infty$).

Proof. a) First, consider the case when not all $\log_n p_i$ ($i = 1, \ldots, m$) are rational. Without loss of generality, we can assume that the last such number, $\log_n p_m$, is irrational.

Let $T$ be one of the numbers satisfying the conditions of Lemma 1 for the linear form $-\sum_{i=1}^{m} k_i \log_n p_i$, and $T_2 = T_2(T)$ a number satisfying the conditions of Lemma 11.

Consider a set $\mathfrak{M}_1$ ($\mathfrak{M}_2$) of vectors $(k_1, \ldots, k_m)$ such that

    $\{-\sum_{i=1}^{m} k_i \log_n p_i\} \leqslant 2/T$  (respectively, $1 - \{-\sum_{i=1}^{m} k_i \log_n p_i\} \leqslant 2/T$).

According to Lemma 1, the sets $\mathfrak{M}_1$ and $\mathfrak{M}_2$ are not empty, and satisfy Condition 1 with parameter $T$. Let:

    $\mathfrak{M}_l := \mathfrak{M}_l \cup \{(k_1, \ldots, k_m) : \sum_{i=1}^{m} k_i = T_2\}$  ($l = 1, 2$).

The sets $\mathfrak{M}_1$ and $\mathfrak{M}_2$ also satisfy Condition 1 with parameter $T$. Let $M_1$ and $M_2$ be the sets of words that are associated with the sets of vectors $\mathfrak{M}_1$ and $\mathfrak{M}_2$, respectively (see Section 4 for details). Let $\{a_{i_k}\}_{k=1}^{\infty}$ be some message. Then $k(a_{i_1} \ldots a_{i_{T_2}}) \in \mathfrak{M}_l$, and according to Lemma 2, such a message begins with some word in $M_l$ ($l = 1, 2$). Therefore, for any $A \in M_1 \cup M_2$

    $|A| \leqslant T_2$.    (24)

According to Lemma 10

    $\sum_{A \in M_l} p(A)\,|A| \lesssim T^3$  ($l = 1, 2$).    (25)

From Lemma 11 and (24)

    $\sum_{A \in M_l,\; |A| = T_2} p(A) \lesssim T^{-2}$  ($l = 1, 2$).    (26)

Let us now define

    $l(A) = \begin{cases} [-\log_n p(A)] & \text{if } A \in M_1,\ A \notin M_2, \\ [-\log_n p(A)] + 1 & \text{if } A \in M_2. \end{cases}$    (27)

If

    $\sum_{A \in M_1} n^{-l(A)} \leqslant 1$,    (28)

then the words $A_j$ can be taken from $M_1$. Lemma 12 ensures that $M_1 = M_1 \triangle M_1$ has the required properties.



Let

    $\sum_{A \in M_1} n^{-l(A)} > 1$.    (29)

From (27) it follows that

    $\sum_{A \in M_2} n^{-l(A)} < 1$.    (30)

We will assume that

    $\sum_{A \in M_1 \triangle M_2} n^{-l(A)} \leqslant 1$.    (31)

If the contrary is true, we can simply exchange the positions of $M_1$ and $M_2$ in the following construction procedure. Let us enumerate the words in $M_2$: $M_2 = \{A_s,\ s = 1, 2, \ldots\}$. Consider

    $g(k) = \sum_{A \in M_1 \triangle \left( \bigcup_{s=1}^{k} A_s \right)} n^{-l(A)}$.

Due to (29) and (31) there exists $k_0$ such that

    $g(k_0 - 1) > 1 \geqslant g(k_0)$.    (32)

Based on (27), for any $k$

    $g(k - 1) - g(k) \leqslant n\, p(A_k)$.    (33)

Since $|A_k| \geqslant T$ for any $k$, then

    $p(A_k) \leqslant \left( \max_{1 \leqslant l \leqslant m} p_l \right)^{T}$.    (34)

We will take the words $A_j$ from $M_1 \triangle \left( \bigcup_{s=1}^{k_0} A_s \right)$. The uniqueness of the representation is guaranteed by Lemma 12.
If (28) holds, then from (27) it follows that for $T > 4$

    $0 \leqslant 1 - \sum_j n^{-l(A_j)} \leqslant \sum_{j:\, A_j \in M_2} p(A_j) = \sum_{j:\, |A_j| = T_2} p(A_j)$.    (35)

Using (26) and (35) we obtain

    $0 \leqslant 1 - \sum_j n^{-l(A_j)} \lesssim T^{-2}$.    (36)

If (29) holds, then using (32), together with (33) and (34), we also arrive at (36).

Observe that (36) is a Kraft inequality for a coding system with code lengths $\{l(A_j)\}$. This means that there exists a decipherable prefix code with $|\varphi(A_j)| = l(A_j)$ ($j = 1, 2, \ldots$) (see [7]). The redundancy of such a code provides an upper bound for the redundancy of the optimal one, which can be found by using the Huffman technique [7].

From (27) it follows that for any $j$: $|\varepsilon_j| \leqslant 1$ (see (3)). If $|A_j| < T_2$, then $k(A_j) \in \mathfrak{M}_1 \cup \mathfrak{M}_2$, and therefore, due to (27)

    $|\varepsilon_j| \leqslant 2/T$.    (37)

From (26) we have

    $\sum_{j:\, |A_j| = T_2} p(A_j) \leqslant \sum_{A \in M_1:\, |A| = T_2} p(A) + \sum_{A \in M_2:\, |A| = T_2} p(A) \lesssim T^{-2}$.    (38)

From (37) and (38) we obtain

    $\sum_j p(A_j)\, \varepsilon_j^2 \lesssim T^{-2}$.    (39)

From (36), (39), and the second claim of Theorem 1, it follows that

    $R \lesssim \bar N^{-1} T^{-2}$,    (40)

while from (25) it follows that

    $\bar N \leqslant \sum_{A \in M_1} p(A)\,|A| + \sum_{A \in M_2} p(A)\,|A| \lesssim T^3$.    (41)

According to Lemma 11

    $N \leqslant T_2 \lesssim T^3 \ln T$.    (42)

This completes the proof of the Theorem for the case of irrational $\log_n p_m$.

b) All $\log_n p_i$ ($i = 1, \ldots, m$) are rational. We use the same techniques and ideas as in the previous case. However, here it is possible to prove an even stronger statement, namely that the redundancy can be made arbitrarily small with a bounded average delay, and that it decays exponentially with the growth of the maximum delay. □

Corollary 2. The estimate $R \lesssim \bar N^{-5/3}$ holds. This follows immediately from the first two inequalities of Theorem 2.

Corollary 3. For infinitely many $T$ there exist codes such that

    $R \lesssim \bar N^{-5/3}$,  $N \lesssim \bar N \ln \bar N$.

Proof. Consider case a) first. In order to construct a code we select words $A$ such that their $t(A)$ are multiples of $T^2$, and the profile vectors of different words, say $(k_1, \ldots, k_m)$ and $(k'_1, \ldots, k'_m)$, are either the same, or $|k_m - k'_m| \geqslant \frac{1}{3} T$ (but Condition 1 still holds). Then all claims of Theorem 2 remain correct, but, at the same time, $\bar N \gtrsim T^3$. This fact, combined with (42), leads to the expression claimed by this Corollary. The proof of case b) is obtained in essentially the same way. □

In conclusion, we provide a very simple example of the construction of such a code. We deal with an input alphabet $\{a, b\}$, probabilities $p(a) = 0.4$, $p(b) = 0.6$, entropy $H = 0.971$, and output alphabet $\{0, 1\}$. This is case a). The corresponding linear form is $f(k_1, k_2) = 1.322\,k_1 + 0.737\,k_2$. For simplicity, instead of searching for the denominators of all suitable fractions, we will directly specify the accuracy of the approximation of $f(k_1, k_2)$ by integer numbers (the accuracy used for code construction in Theorem 2 is $2/T$).

Let the accuracy be 0.3. In $\mathfrak{M}_1$ we include all non-zero vectors $(k_1, k_2)$ such that $\{f(k_1, k_2)\} < 0.3$, while in $\mathfrak{M}_2$ we include vectors such that $\{f(k_1, k_2)\} > 0.7$. Thus $(1, 0) \in \mathfrak{M}_1$, $(1, 1) \in \mathfrak{M}_2$, while vectors $(0, 1)$, $(2, 0)$, $(0, 2)$ belong to neither of these sets. Let $T_2 = 3$. Now we can find $M_1$, $M_2$, and $l(A)$:

    A:      a    baa  bab  bba  bbb  ab   ba   aaa  aab
    M_1:    a    baa  bab  bba  bbb
    M_2:                   bba  bbb  ab   ba   aaa  aab
    l(A):   1    3    2    3    3    3    3    4    4

We obtain

    $\sum_{A \in M_1} 2^{-l(A)} = \frac{9}{8} > 1$,  $\sum_{A \in M_2} 2^{-l(A)} = \frac{5}{8} < 1$.

We also have $M_1 \triangle M_2 = \{a, ba, bba, bbb\}$, $\sum_{A \in M_1 \triangle M_2} 2^{-l(A)} = \frac{7}{8} < 1$. Therefore $M_1$ has to be sequentially combined with words from $M_2$, but the only non-trivial extension is the word $ba$, since the other words in $M_2$ are either present in $M_1$ already, or are extensions of the word $a \in M_1$. So, in our case $M_1 \triangle \{ba\} = M_1 \triangle M_2$. We use $M_1 \triangle M_2$ as the words $A_j$. Codewords $\varphi(A_j)$ can be found using the Huffman technique:

    a → 0,  ba → 10,  bba → 110,  bbb → 111.

For this code $\bar N = 1.96$, $N = 3$, $R = 0.029$.
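The figures of this example can be reproduced with a short script. This is a sketch for checking the arithmetic only: the Huffman routine below is a generic textbook implementation (not code from the paper), it returns codeword lengths rather than the codewords themselves, and the variable names are hypothetical.

```python
import heapq
import math

def huffman_lengths(probs):
    """Binary Huffman codeword lengths for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)                     # unique tie-breaker for merged nodes
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:            # every leaf under the merge gets one more bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next_id, group1 + group2))
        next_id += 1
    return lengths

letter_probs = {"a": 0.4, "b": 0.6}
words = ["a", "ba", "bba", "bbb"]
word_probs = [math.prod(letter_probs[c] for c in w) for w in words]
code_lens = huffman_lengths(word_probs)

n_bar = sum(p * len(w) for p, w in zip(word_probs, words))          # average delay
n_max = max(len(w) for w in words)                                   # maximum delay
entropy = -sum(p * math.log2(p) for p in letter_probs.values())
avg_code_len = sum(p * l for p, l in zip(word_probs, code_lens))
redundancy = avg_code_len / n_bar - entropy / math.log2(2)           # output alphabet size n = 2
```

In this particular example each codeword has the same length as the input word it encodes, so the average number of output letters per input letter is exactly 1 and the redundancy reduces to $1 - H$.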



6 Construction of a uniform on the output code 

As it was pointed out in Section 1, the main advantage of the uniform on the output codes is the 
lack of the running synchronization error. 

Theorem 3. a) For any Bernoulli source and any $L \geqslant \log_n m$ there exists a decipherable code such that

    $|\varphi(A_j)| = L$  ($j = 1, 2, \ldots$)  and  $R \lesssim \bar N^{-1}$  ($\bar N \to \infty$).

b) If $p_1 = \ldots = p_m = 1/m$, then $R \leqslant \bar N^{-2}$ for infinitely many $L$.
Proof. a) Without any loss of generality we can assume that $p_m = \min_{1 \leqslant i \leqslant m} p_i$, and therefore

    $-\log_n p_i \leqslant -\log_n p_m$  ($i = 1, \ldots, m - 1$).    (43)

The number of blocks of length $L$ in the output alphabet is $n^L$. When $L \geqslant \log_n m$ it is not smaller than the number of symbols in the input alphabet. Therefore input symbols $a_i$ ($i = 1, \ldots, m$) can be mapped to different words $\varphi(a_i)$ of length $L$, which results in a decipherable code. In what follows, we construct a code for

    $L \geqslant -\log_n p_m$.    (44)

Consider a set $\mathfrak{M}$ of vectors

    $\left( k_1, \ldots, k_{m-1}, \left[ -\log_n^{-1} p_m \left( L + \sum_{i=1}^{m-1} k_i \log_n p_i \right) \right] \right)$,    (45)

where $k_i = 0, 1, \ldots$ ($i = 1, \ldots, m - 1$), and $-\sum_{i=1}^{m-1} k_i \log_n p_i \leqslant L$. Let $M$ be the set of words associated with $\mathfrak{M}$ (see Section 4). Consider an arbitrary message $\{a_{i_k}\}_{k=1}^{\infty}$. Let $k(a_{i_1} \ldots a_{i_r}) = (k_1(r), \ldots, k_m(r))$ ($r = 1, 2, \ldots$), and $-\sum_{i=1}^{m} k_i(r) \log_n p_i = h(r)$. From (43) and (44) it follows that $h(1) \leqslant -\log_n p_m \leqslant L$, that $h(r) \to \infty$ for $r \to \infty$, and that $h(r + 1) \leqslant h(r) - \log_n p_m$. Therefore, there exists a maximum number $r$ such that $h(r) \leqslant L$. For such a number $r$

    $L + \log_n p_m < h(r) \leqslant L$.    (46)

From (46) we obtain

    $-\frac{L + \sum_{i=1}^{m-1} k_i(r) \log_n p_i}{\log_n p_m} - 1 < k_m(r) \leqslant -\frac{L + \sum_{i=1}^{m-1} k_i(r) \log_n p_i}{\log_n p_m}$.    (47)

From (47) it follows that $k(a_{i_1} \ldots a_{i_r}) \in \mathfrak{M}$. Therefore, according to Lemma 2, the message $\{a_{i_k}\}_{k=1}^{\infty}$ begins with a word from $M$. So any message begins with some word in $M$. Since $M$ is prefix-free, $M = M \triangle M$, and from Lemma 12 it follows that any message can be uniquely represented by words from $M$. Therefore, we can select $\{A_j\} = M$.

Due to (46)

    $L + \log_n p_m < -\log_n p(A_j) \leqslant L$  ($j = 1, 2, \ldots$).    (48)

From (48) it follows that $p(A_j) \geqslant n^{-L}$, and therefore the number of words $A_j$ does not exceed $n^L$. Different words $A_j$ can be mapped to different codewords $\varphi(A_j)$ of length $L$, which results in a uniform on the output code. By using estimate (48) in (1) (see Section 2), we arrive at

    $R \leqslant \bar N^{-1} \sum_j p(A_j) \left( -\log_n p_m \right) \lesssim \bar N^{-1}$,

which proves the first part of the theorem.
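The construction of part a) amounts to extending an input word letter by letter until $-\log_n p(A)$ first exceeds $L + \log_n p_m$. The sketch below illustrates this under assumptions: a two-letter source with probabilities 0.6 and 0.4, a binary output alphabet, and $L = 4$ are chosen for illustration, and the function names are hypothetical. It checks that the resulting words are at most $n^L$ in number and that the redundancy respects the bound derived from (48).

```python
import math

def uniform_output_words(letter_probs, n, L):
    """Map each parse word to h = -log_n p(A), stopping at the first prefix with h > L + log_n p_m."""
    log_pm = math.log(min(letter_probs.values()), n)    # log_n of the smallest probability
    words = {}
    stack = [("", 0.0)]                                  # (prefix, h of prefix)
    while stack:
        prefix, h = stack.pop()
        for letter, p in letter_probs.items():
            h_new = h - math.log(p, n)
            if h_new > L + log_pm:                       # condition (48): the word is complete
                words[prefix + letter] = h_new
            else:                                        # still too probable: keep extending
                stack.append((prefix + letter, h_new))
    return words

def code_stats(words, letter_probs, n, L):
    """Average delay and redundancy of the code that sends every word to a block of length L."""
    n_bar = sum(n ** (-h) * len(w) for w, h in words.items())
    entropy = -sum(p * math.log2(p) for p in letter_probs.values())
    return n_bar, L / n_bar - entropy / math.log2(n)

letter_probs = {"a": 0.6, "b": 0.4}                      # illustrative source
n, L = 2, 4
words = uniform_output_words(letter_probs, n, L)
n_bar, redundancy = code_stats(words, letter_probs, n, L)
bound = -math.log(min(letter_probs.values()), n) / n_bar
```

Each word found this way has probability at least $n^{-L}$, so distinct blocks of length $L$ are available for all of them; floating-point logarithms are adequate here because no word lands exactly on a threshold.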

b) Let now $p_1 = \ldots = p_m = 1/m$. There exist infinitely many natural numbers $X$ and $L$ such that

    $L - \frac{1}{X} \leqslant X \log_n m \leqslant L$    (49)

(see [6, p. 3]). As words $A_j$ we can select all possible combinations of input symbols of length $X$. They all have $\log_n p(A_j) = -X \log_n m$, and based on (49) their number does not exceed $n^L$. Therefore, there exists a decipherable code with $|\varphi(A_j)| = L$. Due to (49), the redundancy of such a code is

    $R \leqslant \bar N^{-1} \sum_j p(A_j)\, \frac{1}{X} = \bar N^{-2}$,

since $X = \bar N = N$. This completes the proof. □
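Suitable pairs $(X, L)$ for the equiprobable case can be found by direct search. The following sketch is illustrative only: it assumes $m = 3$ input symbols and a binary output alphabet, scans block lengths $X$ up to an arbitrary limit, keeps those for which the smallest integer $L \geqslant X \log_n m$ satisfies (49), and reports the redundancy $(L - X \log_n m)/X$ of the corresponding block code. The function name is hypothetical.

```python
import math

def good_block_lengths(m, n, x_max):
    """Triples (X, L, R) with L - 1/X <= X*log_n(m) <= L, where R = (L - X*log_n(m)) / X."""
    c = math.log(m, n)
    found = []
    for x in range(1, x_max + 1):
        L = math.ceil(x * c)                 # smallest block length with n**L >= m**x
        gap = L - x * c
        if gap <= 1 / x:                     # left inequality of (49)
            found.append((x, L, gap / x))
    return found

pairs = good_block_lengths(3, 2, 50)         # ternary source, binary output (assumed example)
```

For instance, $X = 5$ gives $L = 8$, since $3^5 = 243 \leqslant 2^8 = 256$, and the resulting redundancy is well below $1/X^2$.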
Remark 1. It is clear that $N \gtrsim \bar N \gtrsim N$, $\bar N \gtrsim L \gtrsim \bar N$.



7 Lower bounds for redundancy

In the previous sections we have obtained the upper bounds for the redundancy. In conclusion we 
will provide (without proofs) the lower bounds. 

A Bernoulli source is fully described by its probabilities $p_1, \ldots, p_{m-1}$. If we use an $(m-1)$-dimensional Lebesgue measure for the set of points $(p_1, \ldots, p_{m-1})$, then the following holds.

Theorem 4. For almost all Bernoulli sources

    $R \gtrsim \bar N^{-9} \ln^{-8} \bar N$  ($\bar N \to \infty$).

We give a sketch of the proof.

First we establish that vectors $(k_1, \ldots, k_m)$ for which $\| -\sum_{i=1}^{m} k_i \log_n p_i \|$ is small are sufficiently isolated for almost all sources. Then we obtain an estimate similar (but inverse) to the claim of Lemma 10. Finally we apply the corollary of Theorem 1.

Theorem 5. If for some $i_0$: $p_{i_0} \neq 1/m$, then for a uniform on the output code $R \gtrsim \bar N^{-1}$ ($\bar N \to \infty$).

We give a sketch of the proof.

First we find constants $c_6 > 0$, $c_7 > 0$ such that words with $|L + \log_n p(A_j)| \leqslant c_6$ ($L = |\varphi(A_j)|$) have a combined probability not exceeding $c_7$. Then we apply the first inequality from Theorem 1.

References 

[1] V. I. Levenstein, On Some Properties of Coding Systems, Dokl. Acad. Sci. USSR, 140 (6) (1971) 1274-1277.

[2] C. E. Shannon, A Mathematical Theory of Communication, Bell System Tech. J., 27 (3) 379-423, and 27 (4) 623-656 (1948).

[3] V. M. Sidelnikov, On Statistical Properties of Transformations Carried out by Finite Automata, Cybernetics, 6 (1965) 1-14.

[4] R. E. Krichevski, The Length of a Block Necessary for Attaining any Given Redundancy, Dokl. Acad. Sci. USSR, 171 (1) (1966) 37-40.

[5] B. McMillan, Two Inequalities Implied by Unique Decipherability, IRE Trans. Inform. Theory, 4 (1956) 115-116.

[6] J. W. S. Cassels, An Introduction to Diophantine Approximation (Cambridge University Press, 1957).

[7] D. A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proc. IRE, 40 (9) (1952) 1098-1101.






